Frequent Items Mining Algorithm Over High Speed Network Flows Based on Double Hash Method

Authors

  • Lei Bai
  • Chao Chen
Abstract

In high-speed backbone networks, the number of network flows grows rapidly as link speeds increase. Because hardware computing and storage resources are limited, identifying and measuring large flows promptly and accurately in massive traffic data has become an active problem in high-speed network flow measurement. The MF algorithm is prone to false positives, and its frequent updates place heavy pressure on the system. To address these defects, this paper proposes a new algorithm based on double hashing to identify frequent items among large flows. We analyze the complexity and false positive rate of the algorithm, and use simulation to study how parameter configuration affects the statistical accuracy and discard rate for large-flow frequent items. The theoretical analysis and simulation results indicate that, compared with the MF algorithm, our algorithm identifies large-flow frequent items more accurately and meets the needs of practical measurement.
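
The abstract does not spell out the algorithm itself. As a rough illustration only, the Python sketch below assumes that "MF" denotes a multistage counting filter and that "double hash" means deriving every stage index from two base hash values, h_i(x) = (h1(x) + i * h2(x)) mod width. The class name `DoubleHashFilter`, its parameters, and the use of SHA-256 are hypothetical choices made for this sketch, not the authors' design.

```python
import hashlib


class DoubleHashFilter:
    """Illustrative counting filter; stage indices come from double hashing.

    Assumption: this is NOT the paper's algorithm, only a generic sketch.
    """

    def __init__(self, stages=4, width=1024, threshold=100):
        self.stages = stages
        self.width = width
        self.threshold = threshold
        # One counter array per stage.
        self.counters = [[0] * width for _ in range(stages)]

    def _indices(self, flow_id):
        # Two base hashes taken from a single digest of the flow identifier.
        digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, so never zero
        return [(h1 + i * h2) % self.width for i in range(self.stages)]

    def estimate(self, flow_id):
        # The minimum over stages never underestimates the true count.
        return min(self.counters[s][j]
                   for s, j in enumerate(self._indices(flow_id)))

    def add(self, flow_id, size=1):
        # Update one counter per stage, then report whether the flow
        # now looks "large" (estimate at or above the threshold).
        for s, j in enumerate(self._indices(flow_id)):
            self.counters[s][j] += size
        return self.estimate(flow_id) >= self.threshold


if __name__ == "__main__":
    f = DoubleHashFilter()
    for _ in range(150):
        large = f.add("10.0.0.1->10.0.0.2")
    print(large, f.estimate("10.0.0.1->10.0.0.2"))
```

Because counters are shared between flows, the estimate can exceed the true count, which is the source of false positives in filters of this kind; the threshold, number of stages, and width trade memory against that error.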

Similar resources

HASH-MINE: A New Framework for Discovery of Frequent Itemsets

Discovery of frequently occurring subsets of items, called itemsets, is the core of many data mining methods. Most of the previous studies adopt Apriori-like algorithms, which iteratively generate candidate itemsets and check their occurrence frequencies in the database. These approaches suffer from serious costs of repeated passes over the analyzed database. To address this problem, we propose...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

An Efficient Association Rule Mining Using the H-BIT Array Hashing Algorithm

Association Rule Mining (ARM) finds interesting relationships between the presence of various items in a given database. Apriori is the traditional algorithm for learning association rules. However, it suffers from a large number of database scans and excessive generation of candidate itemsets. Each level of candidate itemsets requires separate memory locations. Hash Based Frequent Itemsets Quadratic Pro...

Publication date: 2017